A Faster Grammar-Based Self-index

Authors

  • Travis Gagie
  • Pawel Gawrychowski
  • Juha Kärkkäinen
  • Yakov Nekrich
  • Simon J. Puglisi

Abstract

To store and search genomic databases efficiently, researchers have recently started building compressed self-indexes based on grammars. In this paper we show how, given a straight-line program with $r$ rules for a string $S[1..n]$ whose LZ77 parse consists of $z$ phrases, we can store a self-index for $S$ in $O(r + z \log\log n)$ space such that, given a pattern $P[1..m]$, we can list the $\mathit{occ}$ occurrences of $P$ in $S$ in $O(m^2 + \mathit{occ} \log\log n)$ time. If the straight-line program is balanced and we accept a small probability of building a faulty index, then we can reduce the $O(m^2)$ term to $O(m \log m)$. All previous self-indexes are larger or slower in the worst case.
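The two size measures in the abstract, the number of grammar rules $r$ and the number of LZ77 phrases $z$, can be made concrete on a toy string. The sketch below is only an illustration under stated assumptions, not the paper's construction: the dict encoding of a straight-line program (SLP) and every function name are hypothetical; the LZ77 parser is one common greedy variant (each phrase is a fresh character or the longest prefix that also starts earlier, overlaps allowed), which may differ from the variant the paper uses; and `naive_occurrences` is a brute-force scan that answers the same query as the self-index without any of its speedups.

```python
from typing import Dict, List, Optional, Tuple, Union

# Hypothetical SLP encoding (illustration only): each rule maps a nonterminal
# either to one terminal character (str) or to an ordered pair of nonterminals.
Rule = Union[str, Tuple[str, str]]
# A phrase is a literal character (str) or a copy (source position, length).
Phrase = Union[str, Tuple[int, int]]


def fibonacci_slp(k: int) -> Dict[str, Rule]:
    """SLP for the k-th Fibonacci word: F1 -> b, F2 -> a, Fi -> F(i-1) F(i-2)."""
    rules: Dict[str, Rule] = {"F1": "b", "F2": "a"}
    for i in range(3, k + 1):
        rules[f"F{i}"] = (f"F{i - 1}", f"F{i - 2}")
    return rules


def expand(rules: Dict[str, Rule], symbol: str,
           memo: Optional[Dict[str, str]] = None) -> str:
    """Return the string generated by `symbol`, memoizing each nonterminal."""
    if memo is None:
        memo = {}
    if symbol not in memo:
        rhs = rules[symbol]
        if isinstance(rhs, str):
            memo[symbol] = rhs
        else:
            memo[symbol] = expand(rules, rhs[0], memo) + expand(rules, rhs[1], memo)
    return memo[symbol]


def lz77_parse(s: str) -> List[Phrase]:
    """Naive O(n^2)-time greedy LZ77 parse (one common variant, see lead-in)."""
    phrases: List[Phrase] = []
    i, n = 0, len(s)
    while i < n:
        best_len, best_src = 0, -1
        for j in range(i):  # candidate sources start strictly before i
            length = 0
            while i + length < n and s[j + length] == s[i + length]:
                length += 1
            if length > best_len:
                best_len, best_src = length, j
        if best_len == 0:
            phrases.append(s[i])  # first occurrence of this character
            i += 1
        else:
            phrases.append((best_src, best_len))
            i += best_len
    return phrases


def lz77_decode(phrases: List[Phrase]) -> str:
    """Rebuild the string; copying one character at a time handles overlaps."""
    out: List[str] = []
    for ph in phrases:
        if isinstance(ph, str):
            out.append(ph)
        else:
            src, length = ph
            for t in range(length):
                out.append(out[src + t])
    return "".join(out)


def naive_occurrences(s: str, p: str) -> List[int]:
    """0-based start positions of p in s by brute force (O(nm) time)."""
    return [i for i in range(len(s) - len(p) + 1) if s[i:i + len(p)] == p]


if __name__ == "__main__":
    rules = fibonacci_slp(8)
    s = expand(rules, "F8")
    phrases = lz77_parse(s)
    assert lz77_decode(phrases) == s
    print(f"r = {len(rules)}, n = {len(s)}, z = {len(phrases)}")
    print("occurrences of 'aba':", naive_occurrences(s, "aba"))
```

Fibonacci words are a convenient test case because both measures stay small: the SLP has one rule per level, and the greedy parse above splits the 21-character word into only 7 phrases.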

Similar resources

Fast and Tiny Structural Self-Indexes for XML

XML document markup is highly repetitive and therefore well compressible using dictionary-based methods such as DAGs or grammars. In the context of selectivity estimation, grammar-compressed trees were used before as synopsis for structural XPath queries. Here a fully-fledged index over such grammars is presented. The index allows one to execute arbitrary tree algorithms with a slow-down that is co...

Comparing confidence-based and conventional scoring methods: The case of an English grammar class

This study aimed at investigating the reliability, predictive validity, and self-esteem and gender bias of confidence-based scoring. This is a method of scoring in which the test takers receive a positive or negative point based on their rating of their confidence in an answer. The participants, who were 49 English-major students taking their grammar course, were given 8 multiple-choice tests d...

Online Self-Indexed Grammar Compression

Although several grammar-based self-indexes have been proposed thus far, their applicability is limited to offline settings where whole input texts are prepared, thus requiring index structures to be rebuilt whenever additional inputs are given, which is often the case in the big data era. In this paper, we present the first online self-indexed grammar compression named OESP-index that can gradually build ...

How Attitude, Self-efficacy, and Job Satisfaction Relate with Teaching Strategies?

The primary purpose of the present study was to explore whether there was any significant relationship between attitude, self-efficacy, and job satisfaction of Iranian EFL teachers on the one hand, and their choice of teaching strategies. Strategies mostly used by participants of the study with low, mid, and high levels of self-efficacy comprised another purpose of the study. To this end, a que...

Self-Indexed Grammar-Based Compression

Self-indexes aim at representing text collections in a compressed format that allows extracting arbitrary portions and also offers indexed searching on the collection. Current self-indexes are unable to fully exploit the redundancy of highly repetitive text collections that arise in several applications. Grammar-based compression is well suited to exploit such repetitiveness. We introduce th...

Publication date: 2012